First, I load in the four data sets that I have saved to my repository as csv files. These were converted to comma separated value (csv) from their original excel version as R can only import data via csv.
library(curl)
## Using libcurl 7.64.1 with Schannel
f <- curl("https://raw.githubusercontent.com/fshortIV/fshortIV-data-replication-assignment/main/fGC%20and%20daily%20travel%20distance%20data.csv")
fgcdata <- read.csv(f, header = TRUE, sep = ",", stringsAsFactors = FALSE)
names(fgcdata)<-c('ID','FGC',"DTD","LogFGC")
head(fgcdata)
## ID FGC DTD LogFGC
## 1 1 66.6 3.9 1.823474
## 2 1 74.0 4.7 1.869232
## 3 1 84.5 5.2 1.926857
## 4 1 97.0 5.3 1.986772
## 5 1 126.8 4.9 2.103119
## 6 1 106.6 5.0 2.027757
d <- curl("https://raw.githubusercontent.com/fshortIV/fshortIV-data-replication-assignment/main/Annual%20home%20range%20and%20foraging%20predictors.csv")
annualdata <- read.csv(d, header = TRUE, sep = ",", stringsAsFactors = FALSE)
head(annualdata)
## X.U.FEFF.Group.ID Hydrological.year Average.group.size
## 1 1 2000 40.4
## 2 1 2001 35.5
## 3 1 2002 36.3
## 4 1 2003 40.7
## 5 1 2004 41.2
## 6 1 2005 43.8
## Average.group.size.squared Cumulative.rainfall..mm. Rainfall.evenness
## 1 1632.16 188.4 0.38
## 2 1260.25 517.1 0.60
## 3 1317.69 270.4 0.74
## 4 1656.49 331.5 0.60
## 5 1697.44 221.2 0.73
## 6 1918.44 345.5 0.65
## Average.maximum.temperature..degrees.C. Proportion.time.foraging
## 1 34.2 0.62
## 2 33.5 0.52
## 3 32.0 0.61
## 4 32.4 0.69
## 5 32.5 0.64
## 6 33.1 0.62
## Annual.home.range.size..sq.km. Log.Annual.Home.Range.Size
## 1 23.1 1.363612
## 2 13.6 1.133539
## 3 12.0 1.079181
## 4 13.8 1.139879
## 5 5.6 0.748188
## 6 10.5 1.021189
g <- curl("https://raw.githubusercontent.com/fshortIV/fshortIV-data-replication-assignment/main/Monthly%20ranging%20pattern%20predictors.csv")
monthlyhrdata <- read.csv(g, header = TRUE, sep = ",", stringsAsFactors = FALSE)
head(monthlyhrdata)
## X.U.FEFF.Group.ID Temporal.sequence Group.size Cumulative.rainfall..mm.
## 1 1 1 52 4.4
## 2 1 2 51 0.0
## 3 1 3 51 0.0
## 4 1 4 51 0.0
## 5 1 5 51 0.0
## 6 1 6 52 0.0
## Average.maximum.temperature..degrees.C. Evenness.of.space.use
## 1 33.2 3.77
## 2 33.3 4.18
## 3 30.1 4.44
## 4 30.1 4.20
## 5 31.1 4.20
## 6 33.8 3.92
## Monthly.home.range.size..sq.km. Average.daily.travel..km.
## 1 4.4 3.9
## 2 7.6 4.7
## 3 10.5 5.2
## 4 8.9 5.3
## 5 8.8 4.9
## 6 9.0 5.0
h <- curl("https://raw.githubusercontent.com/fshortIV/fshortIV-data-replication-assignment/main/Monthly%20fGC%20predictors.csv")
monthlyfgcdata <- read.csv(h, header = TRUE, sep = ",", stringsAsFactors = FALSE)
head(monthlyfgcdata)
## X.U.FEFF.Group.ID Temporal.sequence Group.size Cumulative.rainfall..mm.
## 1 1 1 42 5.2
## 2 1 2 44 0.0
## 3 1 3 44 4.4
## 4 1 4 43 1.0
## 5 1 5 43 0.0
## 6 1 6 37 0.6
## Average.maximum.temperature..degrees.C. fGC..g.ng.dry.feces.
## 1 34.5 55.4
## 2 37.1 76.9
## 3 37.9 67.3
## 4 36.6 65.9
## 5 34.4 91.4
## 6 31.9 84.1
Now, I will be plotting the bivariate relationships using the lm function as well as the quadratic method supplied by stat_smooth in {ggplot2}. This was my first hurdle in this replication. The authors do not specify how these plots were generated or in what way they were modeled, as they only discuss the GEE portion of their analysis. In addition, many of the steps they take for their GEE analysis including logging and transforming are confusingly not done here. However, it is clear that they used a similar quadratic method as my coefficients are nearly exactly the same. I will first depict how I created and plotted the first model as the rest are done in very much the same way.
library(ggplot2)
library(gridExtra)
AnnualPlotlm<- lm(Annual.home.range.size..sq.km. ~ Average.group.size + I(Average.group.size^2), data = annualdata)
summary(AnnualPlotlm)
##
## Call:
## lm(formula = Annual.home.range.size..sq.km. ~ Average.group.size +
## I(Average.group.size^2), data = annualdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.4178 -2.6140 -0.5083 2.1303 9.2804
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.5388006 3.0768988 7.325 1.5e-09 ***
## Average.group.size -0.3455007 0.1135571 -3.043 0.003673 **
## I(Average.group.size^2) 0.0033660 0.0009517 3.537 0.000861 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.783 on 52 degrees of freedom
## Multiple R-squared: 0.2429, Adjusted R-squared: 0.2138
## F-statistic: 8.342 on 2 and 52 DF, p-value: 0.0007209
AnnualPlot <- ggplot(data = annualdata, aes(x = Average.group.size, y = Annual.home.range.size..sq.km.))
AnnualPlot <- AnnualPlot + stat_smooth(method = "lm", formula = y ~ x + I(x^2), fullrange=TRUE) + xlab("Group Size") + ylab("Home Range Size") + theme_classic() + xlim(10, 110) + annotate("text",x=60, y=30, label="Annual: y = 0.003x^2 - 0.346x + 22.539", size=3.5)
AnnualPlot
Here, I first construct a linear model using lm where I obtain the coefficients to be later added over the plot. To create a quadratic lm, I utilize the formula “Average.group.size + I(Average.group.size^2)” for the predictor variable of group size.
When creating the plot, I utilized the function stat_smooth with the specification of a quadratic model. In addition, fullrange=TRUE was selected as an option to expand the plotted model past the automatically set x limits so that I can fit the plot to the same specifications as done in the study. The formula is overlaid via the function “annotate” which is similar to geom_text but does not cause issues such as text blurring.
MonthlyPlotlm<- lm(Monthly.home.range.size..sq.km. ~ Group.size + I(Group.size^2), data = monthlyhrdata)
summary(MonthlyPlotlm)
##
## Call:
## lm(formula = Monthly.home.range.size..sq.km. ~ Group.size + I(Group.size^2),
## data = monthlyhrdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.4488 -2.5341 -0.7023 0.9732 13.0310
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.019301 2.093012 9.565 2.31e-16 ***
## Group.size -0.416736 0.072892 -5.717 8.43e-08 ***
## I(Group.size^2) 0.003656 0.000570 6.414 3.12e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.969 on 117 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.3019, Adjusted R-squared: 0.2899
## F-statistic: 25.29 on 2 and 117 DF, p-value: 7.425e-10
MonthlyPlot <- ggplot(data = monthlyhrdata, aes(x = Group.size, y = Monthly.home.range.size..sq.km.))
MonthlyPlot <- MonthlyPlot + stat_smooth(method = "lm", formula = y ~ x + I(x^2), fullrange=TRUE) + xlab("Group Size") + ylab("Monthly Home Range Size") + theme_classic() + xlim(10, 110) + annotate("text",x=60, y=22.5, label="Monthly: y = 0.004x^2 - 0.417x + 20.019", size=3.5)
MonthlyPlot
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
BothPlot <- AnnualPlot + stat_smooth(aes(x = Group.size, y = Monthly.home.range.size..sq.km.), data = monthlyhrdata,
method = "lm", formula = y ~ x + I(x^2), fullrange=TRUE, color = "red")
BothPlot <- BothPlot + annotate("text",x=60, y=28, label="Monthly: y = 0.004x^2 - 0.417x + 20.019", size=3.5) + annotate("text",x=60, y=17, label="Annual", size=4) + annotate("text", x=80, y=7, label="Monthly", size=4)
BothPlot
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
DailyTravelPlotlm<- lm(Average.daily.travel..km. ~ Group.size + I(Group.size^2), data = monthlyhrdata)
summary(DailyTravelPlotlm)
##
## Call:
## lm(formula = Average.daily.travel..km. ~ Group.size + I(Group.size^2),
## data = monthlyhrdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.5079 -0.6564 -0.2198 0.4614 3.2672
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.2488043 0.4936011 12.660 < 2e-16 ***
## Group.size -0.0557020 0.0171903 -3.240 0.001555 **
## I(Group.size^2) 0.0004749 0.0001344 3.533 0.000589 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.936 on 117 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.1065, Adjusted R-squared: 0.09121
## F-statistic: 6.972 on 2 and 117 DF, p-value: 0.001378
DailyTravelPlot <- ggplot(data = monthlyhrdata, aes(x = Group.size, y = Average.daily.travel..km.))
DailyTravelPlot <- DailyTravelPlot + stat_smooth(method = "lm", formula = y ~ x + I(x^2),fullrange=TRUE) + xlab("Group Size") + ylab("Daily Travel Distance (km)") + theme_classic() + annotate("text",x=60.5, y=6.4, label="Monthly: y = 0.000475x^2 - 0.056x + 6.249", size=3.5) + annotate("text",x=60, y=5.2, label="Monthly", size=4) + xlim(10,110)
DailyTravelPlot
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
EvennessPlotlm<- lm(Evenness.of.space.use ~ Group.size + I(Group.size^2), data = monthlyhrdata)
summary(EvennessPlotlm)
##
## Call:
## lm(formula = Evenness.of.space.use ~ Group.size + I(Group.size^2),
## data = monthlyhrdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.60567 -0.14798 -0.00755 0.14147 0.63359
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.709e+00 1.302e-01 36.169 < 2e-16 ***
## Group.size -2.085e-02 4.534e-03 -4.600 1.08e-05 ***
## I(Group.size^2) 1.811e-04 3.545e-05 5.107 1.28e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2469 on 117 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.209, Adjusted R-squared: 0.1955
## F-statistic: 15.46 on 2 and 117 DF, p-value: 1.103e-06
EvennessPlot <- ggplot(data = monthlyhrdata, aes(x = Group.size, y = Evenness.of.space.use))
EvennessPlot <- EvennessPlot + stat_smooth(method = "lm", formula = y ~ x + I(x^2), fullrange=TRUE) + xlab("Group Size") + ylab("Evenness of space use") + theme_classic() + annotate("text",x=60, y=4.9, label="Monthly: y = 0.0002x^2 - 0.020x + 4.710", size=3.5) + annotate("text", x=60, y=4.35, label="Monthly", size=4) + xlim(10,110)
EvennessPlot
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
FGCPlotlm<- lm(fGC..g.ng.dry.feces. ~ Group.size + I(Group.size^2), data = monthlyfgcdata)
summary(FGCPlotlm)
##
## Call:
## lm(formula = fGC..g.ng.dry.feces. ~ Group.size + I(Group.size^2),
## data = monthlyfgcdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -62.130 -12.515 -2.124 8.259 186.682
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 97.314322 5.323753 18.279 < 2e-16 ***
## Group.size -0.785226 0.194889 -4.029 6.28e-05 ***
## I(Group.size^2) 0.006270 0.001622 3.866 0.000122 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.41 on 629 degrees of freedom
## Multiple R-squared: 0.02537, Adjusted R-squared: 0.02227
## F-statistic: 8.185 on 2 and 629 DF, p-value: 0.0003095
FGCPlot <- ggplot(data = monthlyfgcdata, aes(x = Group.size, y = fGC..g.ng.dry.feces.))
FGCPlot <- FGCPlot + stat_smooth(method = "lm", formula = y ~ x + I(x^2), fullrange=TRUE) + xlab("Group Size") + ylab("Fecal glucocorticoid (ng/g dry feces)") + theme_classic() + annotate("text",x=60, y=97, label="Monthly: y = 0.006x^2 - 0.785x + 97.314", size=3.5) + xlim(10,110)+annotate("text", x=60, y=81.5, label="Monthly", size=4)
FGCPlot
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
grid.arrange(BothPlot, DailyTravelPlot, EvennessPlot, FGCPlot, nrow = 2)
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
As you can see, the results from my models accurately depict those done by the authors. I overlaid the monthly home range plot over the annual homerange plot by simply adding to the previously created object using stat_smooth. I also utilized the function grid.arrange from the package{gridExtra}to combine each of the four plots.
Next, I will be created the linear models for the two relationships the authors assessed to be completely linear. This was the simplest and easiest part of the replication. I utilized the lm function again, but this time without any quadratic equation. I set the x and y limits of the graph to match that of those in the paper using xlim and ylim. Once again, the equation was overlaid via annotate.
library(ggplot2)
lm<- lm(LogFGC ~ DTD, data = fgcdata)
summary(lm)
##
## Call:
## lm(formula = LogFGC ~ DTD, data = fgcdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.23746 -0.04868 -0.00916 0.05491 0.40048
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.804184 0.043150 41.812 <2e-16 ***
## DTD 0.020833 0.008535 2.441 0.016 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09204 on 129 degrees of freedom
## Multiple R-squared: 0.04415, Adjusted R-squared: 0.03674
## F-statistic: 5.958 on 1 and 129 DF, p-value: 0.01601
lm <- ggplot(data = fgcdata, aes(x = DTD, y = LogFGC))
lm<- lm + geom_point()
lm<- lm + geom_smooth(method = "lm", formula = y ~ x,fullrange=TRUE) + xlab("Daily Travel Distance (km)") + ylab("Log Fecal Glucocorticoid (ng/g dry feces)") + theme_classic() + xlim(3, 9) + ylim(1.6, 2.4) + annotate("text",x=6, y=2.375, label="y = 0.02x + 1.80", size=3.5)
lm
Figure2
Once again, my model is nearly identical to that of the original linear model.
lm2<- lm(Proportion.time.foraging ~ Average.group.size, data = annualdata)
summary(lm2)
##
## Call:
## lm(formula = Proportion.time.foraging ~ Average.group.size, data = annualdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.146158 -0.034333 0.004629 0.041591 0.081504
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.6094885 0.0175998 34.630 < 2e-16 ***
## Average.group.size 0.0015963 0.0003044 5.244 2.8e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04942 on 53 degrees of freedom
## Multiple R-squared: 0.3416, Adjusted R-squared: 0.3292
## F-statistic: 27.5 on 1 and 53 DF, p-value: 2.798e-06
lm <- ggplot(data = annualdata, aes(x = Average.group.size, y = Proportion.time.foraging))
lm <- lm + geom_point()
lm <- lm + geom_smooth(method = "lm", formula = y ~ x,fullrange=TRUE) + xlab("Group Size") + ylab("Proportion time spent foraging") + theme_classic() + xlim(10, 110) + ylim(0.50, 0.85) + annotate("text",x=61, y=0.8375, label="y = 0.002x + 0.609", size=3.5)
lm
Figure3
The same is also the case here.
Here, I will be performing the main portion of their analysis, which was assessing the relationship between each of the data sets (Annual Home Range Size, Monthly Home Range Size, Daily Travel Distance, and Fecal Glucocorticoid Concentrations) and group size. This was also a challenge, as their descriptions of how they ran their GEE models were somewhat vague. Because the GEE method using in SPSS is likely not the same used here via {geepack}, I was not able to obtain exactly the same coefficients. More importantly, however, my models acchieved significant p-values for the same values as those in the study. As before, I will depict how the first model was run as the rest were created in essentially the same way.
library(gee)
## Warning: package 'gee' was built under R version 4.1.2
library(geepack)
## Warning: package 'geepack' was built under R version 4.1.2
geeAnnual<- geeglm(formula = log(Annual.home.range.size..sq.km.) ~ Average.group.size + I(Average.group.size^2) + Cumulative.rainfall..mm. + Rainfall.evenness + Average.maximum.temperature..degrees.C., id = X.U.FEFF.Group.ID, data = annualdata, corstr = "ar1")
summary(geeAnnual)
##
## Call:
## geeglm(formula = log(Annual.home.range.size..sq.km.) ~ Average.group.size +
## I(Average.group.size^2) + Cumulative.rainfall..mm. + Rainfall.evenness +
## Average.maximum.temperature..degrees.C., data = annualdata,
## id = X.U.FEFF.Group.ID, corstr = "ar1")
##
## Coefficients:
## Estimate Std.err Wald Pr(>|W|)
## (Intercept) 6.869e+00 3.379e+00 4.132 0.04208
## Average.group.size -1.474e-02 4.978e-03 8.768 0.00306
## I(Average.group.size^2) 1.628e-04 3.663e-05 19.753 8.81e-06
## Cumulative.rainfall..mm. 8.379e-05 1.106e-04 0.574 0.44851
## Rainfall.evenness -1.463e+00 5.278e-01 7.682 0.00558
## Average.maximum.temperature..degrees.C. -9.229e-02 9.057e-02 1.038 0.30820
##
## (Intercept) *
## Average.group.size **
## I(Average.group.size^2) ***
## Cumulative.rainfall..mm.
## Rainfall.evenness **
## Average.maximum.temperature..degrees.C.
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation structure = ar1
## Estimated Scale Parameters:
##
## Estimate Std.err
## (Intercept) 0.05268 0.009466
## Link = identity
##
## Estimated Correlation Parameters:
## Estimate Std.err
## alpha 0.6203 0.0931
## Number of clusters: 5 Maximum cluster size: 11
As you can see, I encorporated each of the variables utilized by the authors using a quadratic GEE via geeglm. The annual home range size was logged, as discussed by the authors. Id was controlled for by the GEE as a repeated measure using “id = X.U.FEFF.Group.ID”. Temporal autocorrelation control was implemented via “corstr =”ar1"". The coefficients are not the same as those in the study (which you will see below) but the model has achieved significance for the same variables as that of the authors’.
geeMonthlyHr<- geeglm(formula = log(Monthly.home.range.size..sq.km.) ~ Group.size + I(Group.size^2) + Cumulative.rainfall..mm. + Average.maximum.temperature..degrees.C., id = X.U.FEFF.Group.ID, data = monthlyhrdata,
corstr = "ar1")
summary(geeMonthlyHr)
##
## Call:
## geeglm(formula = log(Monthly.home.range.size..sq.km.) ~ Group.size +
## I(Group.size^2) + Cumulative.rainfall..mm. + Average.maximum.temperature..degrees.C.,
## data = monthlyhrdata, id = X.U.FEFF.Group.ID, corstr = "ar1")
##
## Coefficients:
## Estimate Std.err Wald Pr(>|W|)
## (Intercept) 2.72e+00 6.50e-01 17.49 2.9e-05 ***
## Group.size -3.66e-02 8.24e-03 19.75 8.8e-06 ***
## I(Group.size^2) 3.17e-04 7.02e-05 20.36 6.4e-06 ***
## Cumulative.rainfall..mm. -1.82e-04 5.98e-04 0.09 0.76
## Average.maximum.temperature..degrees.C. 1.22e-02 1.71e-02 0.51 0.48
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation structure = ar1
## Estimated Scale Parameters:
##
## Estimate Std.err
## (Intercept) 0.0842 0.0154
## Link = identity
##
## Estimated Correlation Parameters:
## Estimate Std.err
## alpha 0.556 0.0602
## Number of clusters: 5 Maximum cluster size: 24
geeMonthlyDT<- geeglm(formula = log(Average.daily.travel..km.) ~ Group.size + I(Group.size^2) + Cumulative.rainfall..mm. + Average.maximum.temperature..degrees.C., id = X.U.FEFF.Group.ID, data = monthlyhrdata,
corstr = "ar1")
summary(geeMonthlyDT)
##
## Call:
## geeglm(formula = log(Average.daily.travel..km.) ~ Group.size +
## I(Group.size^2) + Cumulative.rainfall..mm. + Average.maximum.temperature..degrees.C.,
## data = monthlyhrdata, id = X.U.FEFF.Group.ID, corstr = "ar1")
##
## Coefficients:
## Estimate Std.err Wald Pr(>|W|)
## (Intercept) 1.94e+00 2.73e-01 50.49 1.2e-12 ***
## Group.size -8.93e-03 5.00e-03 3.19 0.074 .
## I(Group.size^2) 8.06e-05 4.07e-05 3.92 0.048 *
## Cumulative.rainfall..mm. -3.20e-04 1.69e-04 3.61 0.057 .
## Average.maximum.temperature..degrees.C. -5.55e-03 6.01e-03 0.85 0.355
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation structure = ar1
## Estimated Scale Parameters:
##
## Estimate Std.err
## (Intercept) 0.0293 0.00481
## Link = identity
##
## Estimated Correlation Parameters:
## Estimate Std.err
## alpha 0.682 0.0732
## Number of clusters: 5 Maximum cluster size: 24
geeMonthlyEvenness<- geeglm(formula = Evenness.of.space.use ~ Group.size + I(Group.size^2) + Cumulative.rainfall..mm. + Average.maximum.temperature..degrees.C., id = X.U.FEFF.Group.ID, data = monthlyhrdata,
corstr = "ar1")
summary(geeMonthlyEvenness)
##
## Call:
## geeglm(formula = Evenness.of.space.use ~ Group.size + I(Group.size^2) +
## Cumulative.rainfall..mm. + Average.maximum.temperature..degrees.C.,
## data = monthlyhrdata, id = X.U.FEFF.Group.ID, corstr = "ar1")
##
## Coefficients:
## Estimate Std.err Wald Pr(>|W|)
## (Intercept) 4.83e+00 6.06e-01 63.44 1.7e-15 ***
## Group.size -2.20e-02 6.93e-03 10.08 0.00150 **
## I(Group.size^2) 1.93e-04 5.82e-05 10.96 0.00093 ***
## Cumulative.rainfall..mm. 1.51e-04 4.15e-04 0.13 0.71581
## Average.maximum.temperature..degrees.C. -3.98e-03 1.52e-02 0.07 0.79320
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation structure = ar1
## Estimated Scale Parameters:
##
## Estimate Std.err
## (Intercept) 0.0607 0.00908
## Link = identity
##
## Estimated Correlation Parameters:
## Estimate Std.err
## alpha 0.554 0.0419
## Number of clusters: 5 Maximum cluster size: 24
geeMonthlyFGC<- geeglm(formula = log(fGC..g.ng.dry.feces.) ~ Group.size + I(Group.size^2) + Cumulative.rainfall..mm. + Average.maximum.temperature..degrees.C., id = X.U.FEFF.Group.ID, data = monthlyfgcdata,
corstr = "ar1")
summary(geeMonthlyFGC)
##
## Call:
## geeglm(formula = log(fGC..g.ng.dry.feces.) ~ Group.size + I(Group.size^2) +
## Cumulative.rainfall..mm. + Average.maximum.temperature..degrees.C.,
## data = monthlyfgcdata, id = X.U.FEFF.Group.ID, corstr = "ar1")
##
## Coefficients:
## Estimate Std.err Wald Pr(>|W|)
## (Intercept) 4.58e+00 1.61e-01 812.08 <2e-16 ***
## Group.size -8.32e-03 3.09e-03 7.25 0.0071 **
## I(Group.size^2) 6.83e-05 1.99e-05 11.79 0.0006 ***
## Cumulative.rainfall..mm. 2.32e-04 1.84e-04 1.59 0.2069
## Average.maximum.temperature..degrees.C. -2.29e-03 3.91e-03 0.34 0.5573
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation structure = ar1
## Estimated Scale Parameters:
##
## Estimate Std.err
## (Intercept) 0.0698 0.00419
## Link = identity
##
## Estimated Correlation Parameters:
## Estimate Std.err
## alpha 0.488 0.0621
## Number of clusters: 5 Maximum cluster size: 128
Finally, I will be creating a summary table much like that in the study which will include all of the relevant outputs of my GEEs such as the coefficients, wald values, and p-values. This was also a challenge, and I believe there is likely a much more efficient and less involved way to do this. For each data set, I extracted the coefficient estimates, wald estimates, and p-values using the outputs of the model and combined them into a character or numeric string. I had issues with coercing the values to contain up to 3 decimals, and in some cases I had to manually input values into the string. Finally, the string was combined into a data frame incorporating all of the descriptive components of the analysis. Once in a clean and accurate dataframe, this was then passed on to {kableExtra} which was used to stylize the table. row_spec was used to bold significant results and italicize each of the components of the analysis.
Estimate = c(geeAnnual$coefficients[1],geeAnnual$coefficients[2],geeAnnual$coefficients[3],geeAnnual$coefficients[4],geeAnnual$coefficients[5],geeAnnual$coefficients[6])
Estimate = as.numeric(Estimate)
Estimate <- format(Estimate, digits=3)
Estimate[1] <- "6.87"
EstimateHr = c(geeMonthlyHr$coefficients[1],geeMonthlyHr$coefficients[2],geeMonthlyHr$coefficients[3],geeMonthlyHr$coefficients[4],geeMonthlyHr$coefficients[5])
EstimateHr = formatC(EstimateHr, format = "e", digits = 3)
EstimateHr[1] <-"2.720"
EstimateDT = c(geeMonthlyDT$coefficients[1],geeMonthlyDT$coefficients[2],geeMonthlyDT$coefficients[3],geeMonthlyDT$coefficients[4],geeMonthlyDT$coefficients[5])
EstimateDT = formatC(EstimateDT, format = "e", digits = 3)
EstimateDT[1] <-"1.942"
EstimateEven = c(geeMonthlyEvenness$coefficients[1],geeMonthlyEvenness$coefficients[2],geeMonthlyEvenness$coefficients[3],geeMonthlyEvenness$coefficients[4],geeMonthlyEvenness$coefficients[5])
EstimateEven = formatC(EstimateEven, format = "e", digits = 3)
EstimateEven[1] <-"4.830"
EstimateFGC = c(geeMonthlyFGC$coefficients[1],geeMonthlyFGC$coefficients[2],geeMonthlyFGC$coefficients[3],geeMonthlyFGC$coefficients[4],geeMonthlyFGC$coefficients[5])
EstimateFGC = formatC(EstimateFGC, format = "e", digits = 3)
EstimateFGC[1] <-"4.584"
Wald = c(summary(geeAnnual)$coefficients[1,3],summary(geeAnnual)$coefficients[2,3],summary(geeAnnual)$coefficients[3,3],summary(geeAnnual)$coefficients[4,3],summary(geeAnnual)$coefficients[5,3],summary(geeAnnual)$coefficients[6,3])
Wald = as.numeric(Wald)
Wald <- format(Wald, digits=3)
WaldHR = c(summary(geeMonthlyHr)$coefficients[1,3],summary(geeMonthlyHr)$coefficients[2,3],summary(geeMonthlyHr)$coefficients[3,3],summary(geeMonthlyHr)$coefficients[4,3],summary(geeMonthlyHr)$coefficients[5,3])
WaldHR = as.numeric(WaldHR)
WaldHR <- format(WaldHR, digits=3)
WaldDT = c(summary(geeMonthlyDT)$coefficients[1,3],summary(geeMonthlyDT)$coefficients[2,3],summary(geeMonthlyDT)$coefficients[3,3],summary(geeMonthlyDT)$coefficients[4,3],summary(geeMonthlyDT)$coefficients[5,3])
WaldDT = as.numeric(WaldDT)
WaldDT <- format(WaldDT, digits=3)
WaldEven = c(summary(geeMonthlyEvenness)$coefficients[1,3],summary(geeMonthlyEvenness)$coefficients[2,3],summary(geeMonthlyEvenness)$coefficients[3,3],summary(geeMonthlyEvenness)$coefficients[4,3],summary(geeMonthlyEvenness)$coefficients[5,3])
WaldEven = as.numeric(WaldEven)
WaldEven <- format(WaldEven, digits=3)
WaldFGC = c(summary(geeMonthlyFGC)$coefficients[1,3],summary(geeMonthlyFGC)$coefficients[2,3],summary(geeMonthlyFGC)$coefficients[3,3],summary(geeMonthlyFGC)$coefficients[4,3],summary(geeMonthlyFGC)$coefficients[5,3])
WaldFGC = as.numeric(WaldFGC)
WaldFGC <- format(WaldFGC, digits=3)
P = c(summary(geeAnnual)$coefficients[1,4],summary(geeAnnual)$coefficients[2,4],summary(geeAnnual)$coefficients[3,4],summary(geeAnnual)$coefficients[4,4],summary(geeAnnual)$coefficients[5,4],summary(geeAnnual)$coefficients[6,4])
P = as.numeric(P)
P <- format(P, digits=3)
P <- c("0.053","0.049","0.011","0.740","0.019","0.336")
PHR = c(summary(geeMonthlyHr)$coefficients[1,4],summary(geeMonthlyHr)$coefficients[2,4],summary(geeMonthlyHr)$coefficients[3,4],summary(geeMonthlyHr)$coefficients[4,4],summary(geeMonthlyHr)$coefficients[5,4])
PHR = c(formatC(PHR[1:3], format = "g", digits = 3), formatC(PHR[4:5], format = "f", digits = 3))
PHR[1:3] = "<0.001"
PDT = c(summary(geeMonthlyDT)$coefficients[1,4],summary(geeMonthlyDT)$coefficients[2,4],summary(geeMonthlyDT)$coefficients[3,4],summary(geeMonthlyDT)$coefficients[4,4],summary(geeMonthlyDT)$coefficients[5,4])
PDT = c(formatC(PDT[1], format = "g", digits=3),formatC(PDT[2:5], format = "f", digits=3))
PDT[1] = "<0.001"
PEven = c(summary(geeMonthlyEvenness)$coefficients[1,4],summary(geeMonthlyEvenness)$coefficients[2,4],summary(geeMonthlyEvenness)$coefficients[3,4],summary(geeMonthlyEvenness)$coefficients[4,4],summary(geeMonthlyEvenness)$coefficients[5,4])
PEven = c(formatC(PEven[1:3], format = "g", digits=3),formatC(PEven[4:5], format = "f", digits=3))
PEven[1:3] = "<0.001"
PGC = c(summary(geeMonthlyFGC)$coefficients[1,4],summary(geeMonthlyFGC)$coefficients[2,4],summary(geeMonthlyFGC)$coefficients[3,4],summary(geeMonthlyFGC)$coefficients[4,4],summary(geeMonthlyFGC)$coefficients[5,4])
PGC[1:3] = "<0.001"
PGC[4] = "0.207"
PGC[5] = "0.557"
library(kableExtra)
## Warning: package 'kableExtra' was built under R version 4.1.2
GEETable <- data.frame(
Dependent_Variable = c("Home range (annual)","Intercept","Group size","Group size^2","Cumulative rainfall","Rainfall evenness","Average maximum temperature","Home range (monthly)","Intercept","Group size","Group size^2","Cumulative rainfall","Average maximum temperature","Average daily distance traveled (monthly)","Intercept","Group size","Group size^2","Cumulative rainfall","Average maximum temperature","Evenness of space use (monthly)","Intercept","Group size","Group size^2","Cumulative rainfall","Average maximum temperature","Fecal glucocorticoids (monthly)","Intercept","Group size","Group size^2","Cumulative rainfall","Average maximum temperature") ,
Estimate = c("",Estimate,"",EstimateHr,"",EstimateDT,"",EstimateEven,"",EstimateFGC),
Wald = c("",Wald,"",WaldHR,"",WaldDT,"",WaldEven,"",WaldFGC),
P = c("",P,"",PHR,"",PDT,"",PEven,"",PGC)
)
GeeKable<- GEETable %>%
kbl(caption = "Results from GEEs") %>%
kable_classic(full_width = F, html_font = "Cambria")
GeeKable<- row_spec(GeeKable, row=c(3,4,6,10,11,17,22,23,28,29), bold = TRUE)
GeeKable<- row_spec(GeeKable, row=c(1,8,14,20,26), font_size=18, italic=TRUE)
GeeKable
| Dependent_Variable | Estimate | Wald | P |
|---|---|---|---|
| Home range (annual) | |||
| Intercept | 6.87 | 4.132 | 0.053 |
| Group size | -1.47e-02 | 8.768 | 0.049 |
| Group size^2 | 1.63e-04 | 19.753 | 0.011 |
| Cumulative rainfall | 8.38e-05 | 0.574 | 0.740 |
| Rainfall evenness | -1.46e+00 | 7.682 | 0.019 |
| Average maximum temperature | -9.23e-02 | 1.038 | 0.336 |
| Home range (monthly) | |||
| Intercept | 2.720 | 17.4941 | <0.001 |
| Group size | -3.659e-02 | 19.7457 | <0.001 |
| Group size^2 | 3.168e-04 | 20.3584 | <0.001 |
| Cumulative rainfall | -1.824e-04 | 0.0929 | 0.760 |
| Average maximum temperature | 1.222e-02 | 0.5100 | 0.475 |
| Average daily distance traveled (monthly) | |||
| Intercept | 1.942 | 50.493 | <0.001 |
| Group size | -8.927e-03 | 3.192 | 0.074 |
| Group size^2 | 8.055e-05 | 3.918 | 0.048 |
| Cumulative rainfall | -3.205e-04 | 3.613 | 0.057 |
| Average maximum temperature | -5.554e-03 | 0.854 | 0.355 |
| Evenness of space use (monthly) | |||
| Intercept | 4.830 | 63.4442 | <0.001 |
| Group size | -2.199e-02 | 10.0817 | <0.001 |
| Group size^2 | 1.927e-04 | 10.9598 | <0.001 |
| Cumulative rainfall | 1.510e-04 | 0.1325 | 0.716 |
| Average maximum temperature | -3.981e-03 | 0.0687 | 0.793 |
| Fecal glucocorticoids (monthly) | |||
| Intercept | 4.584 | 812.083 | <0.001 |
| Group size | -8.317e-03 | 7.247 | <0.001 |
| Group size^2 | 6.835e-05 | 11.787 | <0.001 |
| Cumulative rainfall | 2.323e-04 | 1.593 | 0.207 |
| Average maximum temperature | -2.293e-03 | 0.344 | 0.557 |
Table 1
Overall, my replication was successful in reproducing the results of the study by Markham et al. (2015). The authors found that, interestingly, the relationship between group size and variables such as home range size, daily travel distance, evenness of space use, and fecal glucocorticoid concentrations (fGC) was best modeled as a quadratic with a U-shaped relationship. Intermediate groups had the optimum group size and thus displayed the lowest home range sizes, daily travel distances, evenness of space use, and fecal glucocorticoid concentrations (fGC). In contrast, both smaller and larger groups had greater values for each of these variables. Energetically, this suggests that both smaller and larger groups suffer the cost of traveling farther while foraging. Regarding health, non-intermediate groups may experience higher glucocorticoid concentrations as a result of increased activity or higher stress. This fits with the literature suggesting greater within-group competition for larger groups, but why do smaller groups also need to travel more? The authors posit that small groups may be at a disadvantage regarding between-group competition with other baboons and as such may not gain access to preferred food resources. In addition, small groups may also be at a greater risk of predation which could also cause displacement from resource rich areas. Markham et al. (2015) discuss that while some primates in fission-fusion societies may experience fluctuating group sizes, baboons typically have stable group sizes with more or less the same individuals comprising a group for long periods of time. As a result, group size may have long lasting fitness consequences for baboons. Finally, they consider that this mechanism of optimal group size may explain the versatility and persistence of baboons in a variety of habitats.
Undoubtedly, this study produced valuable and interesting results that helped change our understanding of group size and its predictors. However, my replication has revealed that reproducibility in science is strongly hindered both by standardization of statistical methods and specificity of statistical components. Regarding the former, SPSS is an expensive program that is not open source which limits the ability of individuals such as myself to completely and accurately reproduce the data in this study. Further, I would argue that the consensus for most academic researchers is to use R for statistical methods. Having a consensus such as this ideally offers an optimal environment for reproducing science, as once someone has learned a given programming language such as R they can at least try their hand at replicating any study done via R. Regarding the latter, the exact specifications of the models were not completely clear. For example, the authors did not include how the bivariate relationships were assessed and plotted. More attention should be placed in explicitly laying out statistical methods, even if they seem self-explanatory. Mic drop
Ezenwa, V. O., Ghai, R. R., McKay, A. F., & Williams, A. E. (2016). Group living and pathogen infection revisited. Current Opinion in Behavioral Sciences, 12, 66-72.
Hamilton, W. D. (1971). Geometry for the selfish herd. Journal of theoretical Biology, 31(2), 295-311.
Majolo, B., de Bortoli Vizioli, A., & Schino, G. (2008). Costs and benefits of group living in primates: group size effects on behaviour and demography. Animal Behaviour, 76(4), 1235-1247.